Lexical Micro-adaptation in Statistical Machine Translation
نویسندگان
چکیده
We introduce a generic framework in Statistical Machine Translation (SMT) in which lexical hypotheses, in the form of a target language model local to the input sentence, are used to guide the search for the best translation, thus performing a lexical microadaptation. An instantiation of this framework is presented and evaluated on three language pairs, where these auxiliary hypotheses are derived through triangulation via an auxiliairy language. Our first experiments consider nine auxiliary languages, allowing us to measure their individual contribution. We then combine all their hypotheses through a decoding by consensus. Our experiments show that SMT systems can be improved by automatically produced auxiliary hypotheses. MOTS-CLÉS : traduction automatique statistique, traduction par pivot.
منابع مشابه
A Hybrid Machine Translation System Based on a Monotone Decoder
In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...
متن کاملThe Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
متن کاملIntegration of Machine Translation in On-line Multilingual Applications – Domain Adaptation
Large amounts of bilingual corpora are used in the training process of statistical machine translation systems. Usually a general domain is used as the training corpus. When the system is tested using data from the same domain, the obtained results are satisfactory, but if the test set belongs to a different domain, the translation quality decreases. This is due to insufficient lexical coverage...
متن کاملExploring cross-language statistical machine translation for closely related South Slavic languages
This work investigates the use of crosslanguage resources for statistical machine translation (SMT) between English and two closely related South Slavic languages, namely Croatian and Serbian. The goal is to explore the effects of translating from and into one language using an SMT system trained on another. For translation into English, a loss due to cross-translation is about 13% of BLEU and ...
متن کاملTo Cache or Not To Cache? Experiments with Adaptive Models in Statistical Machine Translation
We report results of our submissions to the WMT 2010 shared translation task in which we applied a system that includes adaptive language and translation models. Adaptation is implemented using exponentially decaying caches storing previous translations as the history for new predictions. Evidence from the cache is then mixed with the global background model. The main problem in this setup is e...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- TAL
دوره 51 شماره
صفحات -
تاریخ انتشار 2010